International Conference e-Learning 2014 


TIME-DECAYED USER PROFILE FOR SECOND 
LANGUAGE VOCABULARY LEARNING SYSTEM 


Li Li 1 and Xiao Wei 2 

1 Department of Foreign Languages, Shanghai University of Political Science and Law, Shanghai, China
2 School of Computer Science and Engineering, Shanghai University, Shanghai, China 


ABSTRACT 

Vocabulary learning is the foundation of second language learning. Many e-learning systems have been developed to
help learners learn vocabulary efficiently. Most of these systems use the Ebbinghaus Forgetting Curve to build the
review schedule for learners. However, learners differ in learning ability, so a review schedule based on the
Ebbinghaus Forgetting Curve may not fit every learner. To solve this problem, this paper proposes the
time-decayed user profile (TUP), which stores personalized forgetting curves for each learner. TUP is first defined,
and then two algorithms, the TUP Training Data Generation algorithm and the TUP updating algorithm, are designed to
train it. The experimental results show that the proposed time-decayed user profile can model the personalized
learning characteristics of learners accurately.

1. INTRODUCTION

Vocabulary learning is the foundation of second language learning (Horwitz, 1998; Laufer et al., 2001; Read,
2007), so many e-learning systems have been developed to help learners learn vocabulary efficiently. In
most of these systems, the Ebbinghaus Forgetting Curve is used to build the review schedules for learners
(Read, 2001).

The German psychologist H. Ebbinghaus found that forgetting starts immediately after learning and that
the process of forgetting is not uniform: forgetting is fast at the beginning and then slows down.
Retention and forgetting are functions of time, and the experimental results are
described by the Ebbinghaus Forgetting Curve (Wixted, 1997), as shown in Figure 1. In Figure 1, the abscissa
denotes the elapsed time since learning, the ordinate denotes the retention of acquired knowledge,
and the curve describes how retention changes after learning. The forgetting curve shows that humans tend
to lose about half of their memory of newly learned knowledge within days or weeks unless they consciously
review the learned material.

The Ebbinghaus Forgetting Curve is built from the learning processes of many learners. It is a general law for
most people and does not consider the individual characteristics of each learner. However, learners differ in
learning ability because each has different memory habits, memory modes, and memory characteristics,
which makes their forgetting curves different (Webb, 2005; Schmitt, 2008). Therefore, a review schedule
that an e-learning system builds from the Ebbinghaus Forgetting Curve may not fit every learner, which
reduces the efficiency of learning.

To solve the above problem, this paper proposes a personalized user profile, named the time-decayed user
profile (TUP), for vocabulary e-learning systems. It has the following characteristics:

(1) Personalization. An individual forgetting curve is generated for each learner, based on his or her
learning process, to capture the learning ability of that learner.

(2) Vocabulary ranking based on difficulty. The forgetting curve also differs when a learner learns
words of different difficulty ranks. Generally, difficult words are much more likely to be forgotten than
easy words.

(3) Dynamic updating. The user profile is updated dynamically in order to capture the learning
ability of a learner in different periods and states.

The rest of the paper is organized as follows. Section 2 defines the proposed time-decayed user profile.
Section 3 discusses how to generate the personalized time-decayed user profile for each learner. Section 4
presents experiments that validate the proposed user profile. Finally, conclusions are drawn in
Section 5.


2. TIME-DECAYED USER PROFILE 

The user profile in an e-learning system should record the individual characteristics of learners. In the learning
process, retention decays as time goes on, and the proposed user profile focuses on this decay. Therefore,
we name the proposed user profile the Time-decayed User Profile (TUP).

Generally, difficult words are much more likely to be forgotten than easy words. The user profile
should record the abilities of a learner in learning vocabulary of different levels. Therefore, how to rank the
vocabulary for TUP is discussed first.

2.1 Rank Vocabulary based on Difficulty 

The target vocabulary should be ranked into different difficulty levels, so that we can generate the
learning tasks and build the forgetting curve for each difficulty level. Furthermore, the learning history of each
word, such as the number of times it has been learned and the time of the most recent learning, is used to
generate the forgetting curve and should therefore be attached to each word in the vocabulary. The vocabulary
model satisfying the above requirements is defined as follows.

Definition 1: Let V be the vocabulary. The vocabulary model with difficulty ranks for V, denoted
by R, is defined as

$$R = \{R_1, R_2, \ldots, R_i, \ldots, R_n \mid R_1 \cap R_2 \cap \cdots \cap R_n = \emptyset,\ R_1 \cup R_2 \cup \cdots \cup R_n = V\}, \tag{1}$$

where n is the number of difficulty ranks in V.

Many vocabulary difficulty rankings exist in practice. For example, in China the College English Test
grades the vocabulary into six difficulty levels from CET1 to CET6. If V is the CET vocabulary, then n
may be set to 6.

Figure 2. TUP Training Process

In R, each $R_i$ is a set of words, and each word, denoted by w, is defined as

$$w = (r, \mathit{lt}, \mathit{et}), \tag{2}$$

where r is the rank that the word belongs to, lt is the number of times the learner has already studied the
word, and et is the elapsed time since the word was last studied.
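As a rough illustration of Definitions (1) and (2), the vocabulary model could be held in memory as follows. This is only a sketch: the `Word` class, its `text` field, and the `build_ranks` helper are hypothetical names introduced here, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One vocabulary entry w = (r, lt, et) from Equation (2)."""
    text: str   # the word itself (hypothetical field, not part of Eq. 2)
    r: int      # difficulty rank, 1..n
    lt: int     # number of times the learner has studied the word
    et: float   # days elapsed since the word was last studied


def build_ranks(words, n):
    """Split the vocabulary V into the rank sets R_1..R_n of Equation (1)."""
    ranks = {i: set() for i in range(1, n + 1)}
    for w in words:
        ranks[w.r].add(w)
    return ranks


# Made-up sample entries, for illustration only.
vocabulary = [
    Word("apple", r=1, lt=3, et=2.0),
    Word("derive", r=4, lt=1, et=0.5),   # et = 0.5 means 12 hours
    Word("ubiquitous", r=6, lt=0, et=0.0),
]
R = build_ranks(vocabulary, n=6)
```

Because every word carries exactly one rank, the rank sets are disjoint and together cover the whole vocabulary, as the definition requires.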

2.2 Time-Decayed User Profile 

To capture the characteristics of a learner in the vocabulary learning process, the user profile should record the
personalized forgetting curve for each difficulty rank of the vocabulary. The user profile satisfying the above
requirements is defined as follows.

Definition 2: The time-decayed user profile, denoted by TUP, is defined as

$$\mathit{TUP} = \{R, \mathit{fc}_1, \mathit{fc}_2, \ldots, \mathit{fc}_i, \ldots, \mathit{fc}_n\}, \tag{3}$$

where R is the vocabulary model with difficulty ranks defined in (1), which records the learning status of each
word as defined in (2), and $\mathit{fc}_i$ records the forgetting curve of the vocabulary of the i-th rank, which is defined as

$$\mathit{fc}_i = \{(\mathit{et}, \mathit{ret}) \mid 0 < \mathit{et} \le 31,\ 0 \le \mathit{ret} \le 100\}, \tag{4}$$

where et is the elapsed time since learning and ret is the retention at time et, which is calculated by

$$\mathit{ret} = \frac{\text{original learning} - \text{relearning}}{\text{original learning}} \times 100. \tag{5}$$

In Equation (5), original learning is the number of words in the vocabulary learning task, relearning is
the number of words that have been forgotten and need to be reviewed, and ret is the resulting retention score at
time et.
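The retention score of Equation (5) can be sketched as a small function. The function name, parameter names, and the input checks are assumptions made for this illustration.

```python
def retention(original_learning: int, relearning: int) -> float:
    """Retention score in [0, 100] as defined by Equation (5)."""
    if original_learning <= 0:
        raise ValueError("the learning task must contain at least one word")
    if not 0 <= relearning <= original_learning:
        raise ValueError("relearning must be between 0 and original_learning")
    return (original_learning - relearning) / original_learning * 100


# A task of 20 words in which 5 words were forgotten gives a retention of 75.
score = retention(original_learning=20, relearning=5)
```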

According to the Ebbinghaus Forgetting Curve, words that are still retained more than a month after learning
will be remembered for a long time. Therefore, it is enough to set the upper bound of
et to 31 days. et can also be a decimal to record an elapsed time of less than a whole day; for example,
et = 0.5 means that the elapsed time is 12 hours.

For a new learner, the values of fc can be initialized according to the Ebbinghaus Forgetting Curve. For a
specific elapsed time et, the corresponding ret can be initialized as

$$\mathit{ret} = e^{-\mathit{et}/r}, \tag{6}$$

where r is the difficulty rank of the vocabulary. Another way is to initialize the forgetting curve from the
average of other learners' forgetting curves.
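The initialization for a new learner might look like the sketch below. It rests on two assumptions that the source does not settle: that the exponent in Equation (6) reads -et/r, and that the result is multiplied by 100 to match the 0 to 100 range of Equation (4). The function name and the `scale` parameter are hypothetical.

```python
import math


def initial_curve(r: int, days: int = 31, scale: float = 100.0):
    """Initial forgetting curve fc_r for a new learner.

    Assumes Equation (6) reads ret = exp(-et / r); `scale` maps the
    result onto the 0..100 range of Equation (4) (also an assumption).
    """
    return [(et, scale * math.exp(-et / r)) for et in range(1, days + 1)]


# One (et, ret) pair per day for difficulty rank 3.
fc_3 = initial_curve(r=3)
```

Whatever the exact exponent, the initial curve should decrease monotonically with et, which is the property the updating step later preserves.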


3. TIME-DECAYED USER PROFILE TRAINING 


The training of TUP is the process of acquiring the characteristics of a learner in the vocabulary learning process.
The acquisition is a dynamic process, and its atomic unit is one elapsed time on one
difficulty rank. The general process of training the user profile is shown in Figure 2 and includes the
following three steps.


3.1 Generate Vocabulary Learning Task 


Vocabulary learning tasks are groups of words used to test the mastery level of a learner. According to the
definition of TUP, the learning task should be able to test the mastery of words belonging to different
difficulty ranks at different elapsed times. The learning task is defined as


$$T = \begin{bmatrix} T_{1,1} & \cdots & T_{1,31} \\ \vdots & T_{i,j} & \vdots \\ T_{n,1} & \cdots & T_{n,31} \end{bmatrix}, \tag{7}$$

where each $T_{i,j}$ is an atomic learning task that tests the mastery level of words belonging to difficulty rank $r_i$ at the
elapsed time j.

The process of generating vocabulary learning task is described in Algorithm 1 . 


Algorithm 1: TUP Training Data Generation

Description: Generate the vocabulary learning tasks for a learner in order to
collect the training data that Algorithm 2 uses to train the learner's
time-decayed user profile.

Input: R, the vocabulary with difficulty ranks
       m, the number of words in each subtask

Output: Learning task, T

For r = 1 to n
    For et = 1 to 31
        T_{r,et} = ∅, k = 0
        While k < m
            Select a word w from R randomly
            If (w.r = r and w.et = et and w ∉ T_{r,et}) Then
                T_{r,et} = T_{r,et} ∪ {w}
            Endif
            k = k + 1
        Endwhile
        If T_{r,et}.length < m Then
            Select (m - T_{r,et}.length) words from R_r - T_{r,et} and add them to T_{r,et}
        Endif
    Endfor
Endfor
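The task generation of Algorithm 1 can be sketched in Python as follows. This is not the authors' implementation: instead of probing random words one at a time, the sketch first filters the words that match the rank and elapsed time and then samples from them, which yields the same kind of task more directly. The `Word` tuple, the `generate_tasks` name, and the `days` and `rng` parameters are assumptions made for the example.

```python
import random
from collections import namedtuple

# w = (r, lt, et) from Equation (2), plus a text label for readability.
Word = namedtuple("Word", ["text", "r", "lt", "et"])


def generate_tasks(R, n, m, days=31, rng=None):
    """Build T[(r, et)] with up to m words for each rank and elapsed day."""
    rng = rng or random.Random()
    T = {}
    for r in range(1, n + 1):
        rank_words = [w for w in R if w.r == r]
        for et in range(1, days + 1):
            # Words of rank r that were last studied exactly et days ago.
            matching = [w for w in rank_words if w.et == et]
            task = rng.sample(matching, min(m, len(matching)))
            if len(task) < m:
                # Top up from the rest of rank r, as in the last step of Algorithm 1.
                rest = [w for w in rank_words if w not in task]
                task += rng.sample(rest, min(m - len(task), len(rest)))
            T[(r, et)] = task
    return T


# Toy vocabulary: 30 words spread over 2 ranks and 3 elapsed days.
R = [Word(f"w{i}", r=1 + i % 2, lt=1, et=1 + i % 3) for i in range(30)]
T = generate_tasks(R, n=2, m=4, days=3, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the generated tasks reproducible, which is convenient when the same task has to be shown again in a later session.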

3.2 Get the Feedback from E-learning System


The vocabulary e-learning system interacts with learners, administers the vocabulary testing tasks, tests the mastery
level for each testing subtask in (7), and generates the test results used to update the user profile in the next step.

Because the design of the e-learning system is not the main work of this paper, here we only give the
definition of the test results, which are denoted as


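$$TR = \begin{bmatrix} tr_{1,1} & \cdots & tr_{1,31} \\ \vdots & tr_{i,j} & \vdots \\ tr_{n,1} & \cdots & tr_{n,31} \end{bmatrix}, \tag{8}$$

where each $tr_{i,j}$ is the result of learning task $T_{i,j}$, and the value of $tr_{i,j}$ is calculated by (5).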


3.3 Update the User Profile 

Based on the test results of the e-learning system, the forgetting curve of each difficulty rank can be generated
and used to update the user profile, as shown in Algorithm 2.

In Algorithm 2, a value of ∞ in the test results means that there is no result for the test subtask; in that case
the value of fc is initialized according to (6).


Algorithm 2: TUP Updating Algorithm

Description: Generate the new forgetting curves to update the TUP
according to the test results from the e-learning system.

Input: TUP
       Test results, TR

Output: TUP (updated)

For r = 1 to n
    For et = 1 to 31
        If (tr_{r,et} = ∞) Then
            ret = e^{-et/r}
        Else
            ret = tr_{r,et}
            If (et > 1 and ret > TUP.fc_r.ret|_{et-1}) Then
                ret = TUP.fc_r.ret|_{et-1}
            Endif
        Endif
        TUP.fc_r = TUP.fc_r ∪ {(et, ret)}
    Endfor
Endfor
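The updating step of Algorithm 2 could be sketched as below. Several details are assumptions rather than the authors' code: missing results are marked with `math.inf` (absent keys are treated the same way), the fallback reads Equation (6) as exp(-et/r) scaled to the 0 to 100 range, and the function returns freshly built curves instead of modifying a TUP object in place.

```python
import math


def update_profile(TR, n, days=31):
    """Return fc[r], a list of (et, ret) pairs, from test results TR[(r, et)].

    math.inf marks a missing result, mirroring the infinity symbol in Algorithm 2.
    The fallback assumes Equation (6) reads exp(-et / r), scaled to 0..100.
    """
    fc = {}
    for r in range(1, n + 1):
        curve = []
        for et in range(1, days + 1):
            result = TR.get((r, et), math.inf)
            if math.isinf(result):
                ret = 100.0 * math.exp(-et / r)
            else:
                ret = result
                # Retention should not rise without review: clamp to the previous point.
                if curve and ret > curve[-1][1]:
                    ret = curve[-1][1]
            curve.append((et, ret))
        fc[r] = curve
    return fc


# Made-up results: day 2 scores higher than day 1 and is clamped to 80.
TR = {(1, 1): 80.0, (1, 2): 85.0, (1, 3): 60.0}
fc = update_profile(TR, n=1, days=4)
```

In this toy run, day 4 has no test result, so its value falls back to the initial curve.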


4. EXPERIMENTS 

Three experiments are presented in this section to validate the effectiveness of the proposed user profile.

Data Set: In these experiments, we use the College English vocabulary as the data set. College English is a
second language course in Chinese universities, which is ranked from level 1 to level 6. The vocabulary is
divided into 6 ranks accordingly, so the number of difficulty ranks n in the experiments is 6.

Experiment Participants: We select 60 non-English-major students from different departments in a
university. Among these students, 20 are freshmen whose English level is 1-2, another 20
are sophomores whose English level is 3-4, and the last 20 are juniors whose English
level is 5-6. Students are also selected according to their scores in the English course to ensure that each group
includes students at different levels: good, middle, and low.

All 60 students are randomly divided into two groups of 30 students each: an experimental group and a control
group.

4.1 Experiment on evaluating the Accuracy of TUP 

Experimental Goal: 

This experiment is designed to validate the accuracy of TUP. 

The accuracy of TUP refers to the degree to which the forgetting curves fit the actual situation of the learners.
We first generate a learning/test task for each learner according to the forgetting curves in his or her TUP. The
learning/test task consists of the words that, according to the forgetting curves, have just been forgotten. If the
test result shows that the learner has really forgotten these words, then the forgetting curve is accurate; otherwise
it is not. The total accuracy of a forgetting curve is the average of the accuracy at each time point on the curve.

Experimental Process: 

(1) Make a 31-day study plan with 30 minutes of learning time each day.

(2) Select a student from the experimental group to participate in the learning process according to the
study plan. During the learning process, the user profile of the student is trained. At the end, the user
profile contains forgetting curves covering 31 days.

(3) Generate test tasks for the student according to the forgetting curves in his or her user profile.

(4) The student takes the tests, and the score of each test is recorded to calculate the accuracy.

(5) Repeat steps (2) to (4) for each student in the experimental group, which gives
30 results.

(6) Calculate the average accuracy of the forgetting curve of each difficulty rank based on the above 30
results.

(7) Calculate the total average accuracy over all the average values in step (6).
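The two averaging steps (6) and (7) can be sketched as follows. The data layout (one dictionary per student mapping a difficulty rank to that student's accuracy) and the function name are assumptions for illustration, and the numbers are invented, not the paper's data.

```python
from statistics import mean


def average_accuracy(per_student):
    """per_student: list of dicts mapping rank -> accuracy for one student.

    Returns (per-rank averages, overall average), i.e. steps (6) and (7).
    """
    ranks = sorted(per_student[0])
    per_rank = {r: mean(s[r] for s in per_student) for r in ranks}
    overall = mean(per_rank.values())
    return per_rank, overall


# Illustrative numbers only, not the paper's data.
students = [{1: 0.9, 2: 0.8}, {1: 0.7, 2: 0.6}]
per_rank, overall = average_accuracy(students)
```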

Experimental Results: 

The experimental results are shown in Figure 3. In the figure, each line is the accuracy of the forgetting
curve of one difficulty rank. For example, R1 refers to the accuracy for the 1st rank of the vocabulary
(the words of College English level 1). Each point on a line is the average accuracy over the results of
30 students. The total average accuracy is 0.852.

4.2 Experiment on comparing the Accuracy between TUP and UP 

Figure 3. The Accuracy of TUP

Figure 4. The accuracy comparison between TUP and UP (Ebbinghaus Forgetting Curve)

Experimental Goal: 

The goal of this experiment is to compare the accuracy of the proposed TUP with the UP that uses 
Ebbinghaus Forgetting Curve. 

Experimental Process: 

The process of the experiment is the same as in the first experiment, except that the participants are
the students of the control group.

Experimental Results: 

The experimental results are shown by the line UP in Figure 4, which is the average over all students of the
control group. The average result of Experiment 1 is also shown by the line TUP in Figure 4 for comparison
with UP.

The results show that TUP (the time-decayed user profile) is more accurate than UP (the user profile using the
Ebbinghaus Forgetting Curve), which suggests that TUP can be used to make the review plan in a
vocabulary e-learning system to improve the efficiency of second language vocabulary learning (SLVL).

4.3 Experiment on the Effectiveness in improving Vocabulary Learning 

Experimental Goal: 

The goal of this experiment is to compare the effectiveness between TUP and UP in improving 
vocabulary learning. 

Experimental Process: 

(1) Make a 31-day study plan with 30 minutes of learning time each day.

(2) All the students of the experimental group learn with the help of TUP. At the same time, all
the students of the control group learn with the help of UP (the same as in Experiment 2).

(3) After the plan is completed, each student is tested to measure how well he or she has mastered
the words studied.

(4) Calculate the average score of the students of each grade (freshman, sophomore, and junior) in each
group (the experimental group and the control group).

Experimental Results: 

The results shown in Figure 5 indicate that TUP is more effective than UP in improving vocabulary
learning.

Figure 5. The effectiveness comparison between TUP and UP in improving vocabulary learning


5. CONCLUSION 

Vocabulary learning is the foundation of second language learning, and a rich vocabulary makes the skills of
listening, speaking, reading, and writing easier to perform. Many e-learning systems have been developed to help
learners learn vocabulary efficiently. In most of these systems, the Ebbinghaus Forgetting Curve is used to
build the review schedule for learners. However, learners differ in learning ability, so a review
schedule based on the Ebbinghaus Forgetting Curve may not fit every learner. To solve this problem, this
paper proposed the time-decayed user profile (TUP) to store personalized forgetting curves for each
learner. TUP was first defined, and then two algorithms, the TUP Training Data Generation algorithm and the TUP
updating algorithm, were proposed to train it. The experimental results show that the proposed time-decayed
user profile can model the personalized learning characteristics of learners accurately.

The future work of this paper includes the following: 

(1) Develop an e-learning system with user-friendly interface to support larger experiments. 

(2) Validate the proposed user profile in larger experiments which means more participants and longer 
experimental period. 